Custom identities #4764

Merged
merged 42 commits into from
Apr 10, 2024

Conversation

@galvana galvana commented Mar 30, 2024

Closes PROD-1806

Description Of Changes

This change adds the ability to use custom identities in dataset identity references:

collections:
  - name: loyalty
    fields:
      - name: id
        data_categories: [user.unique_id]
        fides_meta:
          identity: loyalty_id

Since a Fides instance is more likely to use multiple identities after this change (for example email + customer_id), I also made a few changes in our graph utils to support multiple identities for SaaS connectors.

Code Changes

Privacy Center

  • Updated the PrivacyRequestForm.tsx to be able to render the new custom identities

Admin UI

  • Updated the privacy request table and privacy request detail page to support the identity labels that are now returned from the API

Identity management

  • Removed the providedidentitytype constraint on providedidentity.field_name to allow any custom defined field name
  • Updated the Identity schema to allow extra fields as long as they have the LabeledIdentity type
  • Updated cache_identity/get_cached_identity_data and persist_identity/get_persisted_identity functions to support labeled identities

Task execution

  • Updated pre_process_input_data in graph_task.py to return unique output values (see inline comments)
  • Removed single identity constraint from SaaS connectors

Steps to Confirm

  • Add a custom identity to the access request in fides/data/sample_project/privacy_center/config/config.json
"identity_inputs": {
  "email": "required",
  "loyalty_id": { "label": "Loyalty ID" }
},
  • Run nox -s fides_env(test)
  • Navigate to Systems & Vendors > Cookie House Loyalty Program > Integrations and enable the integration
  • Navigate to the Privacy Center and submit the following access request
Email: [email protected]
Loyalty ID: CH-1
  • Navigate back to the Admin UI and approve the request
  • Verify the presence of data from the postgres_example_test_extended_dataset in the DSR package that is written to fides_uploads

Pre-Merge Checklist

cypress bot commented Mar 30, 2024

Passing run #7150 ↗︎


Details:

Merge 98f7c26 into 40cef1a...
Project: fides Commit: 7fbc19b0bc ℹ️
Status: Passed Duration: 00:37 💡
Started: Apr 10, 2024 5:43 AM Ended: Apr 10, 2024 5:44 AM

codecov bot commented Apr 1, 2024

Codecov Report

Attention: Patch coverage is 92.37288% with 9 lines in your changes are missing coverage. Please review.

Project coverage is 86.61%. Comparing base (40cef1a) to head (4b448f0).

Files                                                  Patch %   Lines
src/fides/api/models/privacy_request.py                91.89%    1 Missing and 2 partials ⚠️
src/fides/api/util/collection_util.py                  86.36%    1 Missing and 2 partials ⚠️
src/fides/api/schemas/redis_cache.py                   95.00%    1 Missing and 1 partial ⚠️
...rc/fides/api/service/connectors/fides_connector.py  50.00%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4764      +/-   ##
==========================================
- Coverage   86.63%   86.61%   -0.02%     
==========================================
  Files         339      339              
  Lines       20008    20078      +70     
  Branches     2556     2583      +27     
==========================================
+ Hits        17333    17391      +58     
- Misses       2206     2215       +9     
- Partials      469      472       +3     


@galvana galvana changed the title from "First pass of custom identities" to "Custom identities" Apr 2, 2024
@adamsachs
Contributor

making my way through a code review, but posting UAT testing results here per instructions on the PR (thank you for providing a great test setup!). things are looking good! happy path results:
image

also tested that if an invalid custom identity value is provided, the request still succeeds but gets no results in the extended dataset, as expected:
image

noting that the privacy center form validation is a little bit odd in that it allows you to click "continue" but then gives a validation error, as opposed to the other required fields (i.e. the 'standard' identity fields), which actually grey out/block the user from clicking "continue". but that seems acceptable to me, just wanted to point it out (i'm assuming this is expected!)
image

Contributor

@adamsachs adamsachs left a comment

@galvana really nice work making this comprehensive update, and thank you for anticipating all the impacts this change may have, including with relaxing the constraint to allow >1 identity value!

most of my comments are relatively minor tweaks, and i'm able to generally follow the identity-management updates well - i think your approach for custom identity support there is clever and it seems pretty robust (a few things around the edges that may help to make it a bit more defensive/less prone to accidental error moving forward)!

in terms of the graph_task updates, i'm following the concrete functionality you've added, i just always have a bit of trouble wrapping my head around what pre_process_input_data does in general - it's a pretty weighty method! so i'm feeling a bit less confident in my analysis of those changes. nothing particularly stands out to me as problematic, but it may be good to sync up to understand a bit more concretely how those updates will play in with the overall workflow...

as mentioned above, UAT testing is looking good! so i'm about ready to approve this, perhaps you can just look over my comments and we can align on the graph_task updates, and then we should be good to push this through 👍


from fides.api.custom_types import PhoneNumber
from fides.api.schemas.base_class import FidesSchema

MultiValue = Union[Union[StrictInt, StrictStr], List[Union[StrictInt, StrictStr]]]
Contributor

nit: i feel like there's a slightly more idiomatic way to do this - or at least it feels more readable to me. maybe you disagree, or i've got something wrong 😅

Suggested change
- MultiValue = Union[Union[StrictInt, StrictStr], List[Union[StrictInt, StrictStr]]]
+ MultiValue = Union[StrictInt, StrictStr, List[Union[StrictInt, StrictStr]]]

Contributor Author

I agree, I was struggling to get these types to work as expected so I completely looked past the double union 😆
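
The two spellings in the suggestion are equivalent because `typing.Union` flattens nested unions on its own. A minimal check, using plain `int` and `str` as stand-ins for pydantic's `StrictInt` and `StrictStr` (an assumption made only to keep the example dependency-free):

```python
from typing import List, Union

# Stand-ins for pydantic's StrictInt / StrictStr (assumption, avoids a dependency).
Scalar = Union[int, str]

nested = Union[Union[int, str], List[Scalar]]
flat = Union[int, str, List[Scalar]]

# typing collapses the inner Union, so both names describe the same type.
same_type = nested == flat
```

So the change is purely about readability; validation behavior should not differ between the two forms.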

Comment on lines +48 to +60
def __init__(self, **data: Any):
    for field, value in data.items():
        if field not in self.__fields__:
            if isinstance(value, LabeledIdentity):
                data[field] = value
            elif isinstance(value, dict) and "label" in value and "value" in value:
                data[field] = LabeledIdentity(**value)
            else:
                raise ValueError(
                    f'Custom identity "{field}" must be an instance of LabeledIdentity '
                    '(e.g. {"label": "Field label", "value": "123"})'
                )
    super().__init__(**data)
Contributor

fancy!

nicely done to have this "validation" in the constructor to allow extra fields but still effectively constrain them 👍

Comment on lines +71 to +81
def dict(self, *args: Any, **kwargs: Any) -> Dict[str, Any]:
    """
    Returns a dictionary with LabeledIdentity values returned as simple values.
    """
    d = super().dict(*args, **kwargs)
    for key, value in self.__dict__.items():
        if isinstance(value, LabeledIdentity):
            d[key] = value.value
        else:
            d[key] = value
    return d
Contributor

fancy x2
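
A dependency-free sketch of how the constructor check and the `dict()` override quoted above fit together. It swaps the pydantic base class for a plain Python class, and `KNOWN_FIELDS` plus the sample email address are assumptions for illustration, not the merged code:

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class LabeledIdentity:
    label: str
    value: Any


class Identity:
    """Simplified stand-in for the pydantic schema (hypothetical, stdlib only)."""

    KNOWN_FIELDS = ("email", "phone_number")  # assumed subset of the standard fields

    def __init__(self, **data: Any) -> None:
        for name in self.KNOWN_FIELDS:
            setattr(self, name, data.pop(name, None))
        # Anything left over is a custom identity and must carry a label.
        for field, value in data.items():
            if isinstance(value, dict) and "label" in value and "value" in value:
                value = LabeledIdentity(label=value["label"], value=value["value"])
            if not isinstance(value, LabeledIdentity):
                raise ValueError(
                    f'Custom identity "{field}" must be an instance of LabeledIdentity'
                )
            setattr(self, field, value)

    def dict(self) -> Dict[str, Any]:
        # Labeled identities are flattened to their plain values.
        return {
            key: value.value if isinstance(value, LabeledIdentity) else value
            for key, value in vars(self).items()
        }


identity = Identity(
    email="jane@example.com",
    loyalty_id={"label": "Loyalty ID", "value": "CH-1"},
)
flattened = identity.dict()
```

Here `flattened` holds `"CH-1"` under `loyalty_id`, while a bare `Identity(loyalty_id="CH-1")` is rejected with a `ValueError`.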

Comment on lines +40 to +43
{% for identity_type, identity_data in request.identity.items() %}
<div>{{ identity_data.label }}:</div>
<div>{{ identity_data.value }}</div>
{% endfor %}
Contributor

nice, really comprehensive updates to get this working smoothly across the whole stack!

Comment on lines +200 to +201
('CH-1', 'Jane Customer', 100, 'Cookie Rookie'),
('CH-2', 'John Customer', 200, 'Cookie Connoisseur');
Contributor

Contributor Author

I just thought it sounded funny 😆 I didn't know about the site

Contributor

hahaha i know i also thought it sounded great so i looked it up and was so happy with what i found

Comment on lines 398 to 402
if isinstance(value, dict):
    label = value["label"]
    value = value["value"]
else:
    label = None
Contributor

may be worth making this a bit more defensive, or at least adding in some code comments? i know with the current code this is safe, but i get a little bit concerned about a direct dict lookup like this buried pretty deep in the code causing a tough-to-debug/predict problem later on. i guess the risk would be if we ever evolved to have a proper field on the Identity class with a dict type. but maybe just adding in some extra checks on this side too, similar to as you've done in the Identity constructor (i.e. and "label" in value and "value" in value:)? even throwing a more specific runtime error there to alert developers could help prevent a tricky debug later on...

Suggested change
- if isinstance(value, dict):
-     label = value["label"]
-     value = value["value"]
- else:
-     label = None
+ if isinstance(value, dict):
+     if "label" in value and "value" in value:
+         label = value["label"]
+         value = value["value"]
+     else:
+         raise RuntimeError(f"Programming error: unexpected dict value '{value}' found in an Identity's `labeled_dict()`!")
+ else:
+     label = None

Contributor Author

Good suggestion!
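
The suggested check can be sketched as a standalone helper. The name `split_labeled_value` is hypothetical (the PR keeps this logic inline), and the error text is shortened here:

```python
from typing import Any, Optional, Tuple


def split_labeled_value(value: Any) -> Tuple[Optional[str], Any]:
    """Return (label, value) for an identity entry; hypothetical helper name."""
    if isinstance(value, dict):
        if "label" in value and "value" in value:
            return value["label"], value["value"]
        # Fail loudly rather than with a bare KeyError deep in the call stack.
        raise RuntimeError(
            f"Programming error: unexpected dict value '{value}' in labeled identity data"
        )
    # Plain values (standard identities) carry no label.
    return None, value


labeled = split_labeled_value({"label": "Loyalty ID", "value": "CH-1"})
plain = split_labeled_value("CH-1")
```

The benefit is mostly diagnostic: a malformed dict surfaces as a descriptive `RuntimeError` at the point where the assumption is made.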

for key, value in privacy_request.get_persisted_identity()
.labeled_dict(include_default_labels=True)
.items()
if value["value"] is not None
Contributor

is this not a bit risky? i may be missing something, a bit hard for me to determine the possible values here!

Suggested change
- if value["value"] is not None
+ if value.get("value") is not None

Contributor Author

Same reasoning as the other ["value"] access: this should be a dict with label and value keys, and we want to error if that's not the case.
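
A small illustration of the trade-off being discussed, using a hypothetical `non_empty_identities` function and made-up sample data. Direct indexing raises on a malformed entry, whereas `.get("value")` would silently drop it from the result:

```python
from typing import Any, Dict


def non_empty_identities(labeled: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    # Direct indexing on purpose: an entry without a "value" key raises KeyError
    # instead of being filtered out as if it were simply empty.
    return {key: value for key, value in labeled.items() if value["value"] is not None}


well_formed = {
    "email": {"label": "Email", "value": None},
    "loyalty_id": {"label": "Loyalty ID", "value": "CH-1"},
}
kept = non_empty_identities(well_formed)
```

With well-formed input only `loyalty_id` is kept; an entry such as `{"label": "Loyalty ID"}` would raise `KeyError` rather than disappear.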

Contributor

might be nice to get a bit of unit test coverage on these new functions specifically?

Contributor Author

Added tests around the mutability functions

Contributor

lovely! thank you, those look great. they also work as a form of documenting the functionality :)

return output
output[FIDESOPS_GROUPED_INPUTS].add(make_immutable(grouped_data))

return make_mutable(output)
Contributor

ok! this is a very helpful explanation. may be good to put a note about this functionality into the method docstring? (it's already a very explanatory docstring :) )

Comment on lines 15 to 16
def make_immutable(obj: Any) -> Any:
    if isinstance(obj, dict):
Contributor

docstrings here and on make_mutable would be nice!

Contributor Author

Added docstrings

Contributor Author

@galvana galvana left a comment

I addressed most of your comments except the one about adding more tests to the dict and labeled_dict functions. I'd like to know what test cases you had in mind.


Comment on lines 453 to 463
  schema = Identity()
  for field in self.provided_identities:  # type: ignore[attr-defined]
      value = field.encrypted_value.get("value")
      if field.field_label:
          value = LabeledIdentity(label=field.field_label, value=value)
      setattr(
          schema,
-         field.field_name.value,
-         field.encrypted_value["value"],
+         field.field_name,  # type:ignore
+         value,  # type:ignore
      )
  return schema
Contributor Author

This is a good approach 👍

return output
output[FIDESOPS_GROUPED_INPUTS].add(make_immutable(grouped_data))

return make_mutable(output)
Contributor Author

The output dictionary is constructed with deduplicated values for each key, ensuring that the value lists
and the fides_grouped_input list contain only unique elements.
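
One plausible shape for this deduplication, as a sketch only. The names `make_immutable` and `make_mutable` come from the PR, but the bodies below are assumptions, since the thread only shows the first two lines of `make_immutable`:

```python
from typing import Any


def make_immutable(obj: Any) -> Any:
    """Recursively convert dicts and lists into hashable equivalents (assumed behavior)."""
    if isinstance(obj, dict):
        return frozenset((key, make_immutable(value)) for key, value in obj.items())
    if isinstance(obj, list):
        return tuple(make_immutable(item) for item in obj)
    return obj


def make_mutable(obj: Any) -> Any:
    """Reverse make_immutable: frozensets become dicts; tuples and sets become lists."""
    if isinstance(obj, frozenset):
        return {key: make_mutable(value) for key, value in obj}
    if isinstance(obj, (tuple, set)):
        return [make_mutable(item) for item in obj]
    return obj


rows = [
    {"email": "a@example.com", "ids": [1, 2]},
    {"email": "a@example.com", "ids": [1, 2]},  # duplicate grouped input
    {"email": "b@example.com", "ids": [3]},
]
unique = {make_immutable(row) for row in rows}  # dicts are hashable once frozen
restored = make_mutable(unique)  # list of plain dicts again; order is not guaranteed
```

Because the intermediate container is a set, this sketch does not preserve input order, which matters only if downstream code depends on it.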

Contributor

@adamsachs adamsachs left a comment

looking great after the latest updates! discussed some potential further tweaks offline that may be nice, and also getting some FE code review, but this is looking good to go from my end 👍

@galvana galvana requested a review from jpople April 9, 2024 17:50
// extract identity input values
const identityInputValues = Object.fromEntries(
  Object.entries(action.identity_inputs ?? {}).map(([key, field]) => {
    const value = values[key] || null;
Contributor

Do we care about handling boolean and number values here? If values[key] is falsy (including false or 0), this syntax will overwrite that with null. If this is just trying to fall back if the value doesn't exist, I would prefer using ??.

Contributor Author

Good catch, I added explicit checks for undefined and "" before falling back to null.
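
The merged TypeScript is not shown in the thread, so the following is only an analogy in Python (the language of the backend snippets here). Python's `or` has the same pitfall as `||`: it discards every falsy value, while explicit checks keep `0` and `False`. The function name `input_value` is hypothetical:

```python
from typing import Any, Dict


def input_value(values: Dict[str, Any], key: str) -> Any:
    """Fall back to None only for missing or empty-string inputs."""
    value = values.get(key)
    return None if value is None or value == "" else value


form = {"count": 0, "opted_in": False, "loyalty_id": ""}

# Truthiness-based fallback loses legitimate falsy values.
lossy = {key: (form.get(key) or None) for key in form}

# Explicit checks keep 0 and False, and still map "" to None.
explicit = {key: input_value(form, key) for key in form}
```

In this sketch `lossy` maps all three keys to `None`, while `explicit` keeps `0` and `False` and only turns the empty string into `None`.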

@galvana galvana requested a review from jpople April 9, 2024 22:39
@galvana galvana merged commit 3acd0ab into main Apr 10, 2024
46 checks passed
@galvana galvana deleted the PROD-1806-custom-identities branch April 10, 2024 05:53